Reference Manual’s Chapter on Expert Witness Testimony Admissibility – Part 4

In the district court, Judge George O’Toole conducted a four-day pre-trial hearing, at which he heard testimony from Smith and Cranor, as well as from defense expert witnesses. Judge O’Toole’s published opinion carefully and accurately stated the facts and the applicable law, and presented a well-reasoned explanation of why Smith’s opinion was not admissible under Rule 702. Because Milward had no admissible opinion on general causation to support his case, Judge O’Toole granted summary judgment to the defendants.

Milward appealed the judgment. A panel of the First Circuit heard argument and reversed, in an opinion riddled with serious errors.[1] In reviewing the district court’s application of Rule 702, the panel, in an opinion written by Chief Judge Lynch, credulously accepted most of Smith’s and Cranor’s arguments that an ill-defined WOE approach is an acceptable method of guiding scientific judgment. Cranor equated WOE, as used by Smith, with the approach that Sir Austin Bradford Hill described, in 1965, for identifying causal associations from epidemiologic data.[2] Chief Judge Lynch’s opinion faithfully tracked the misrepresentations of Sir Austin’s paper made by Cranor and by Milward’s lawyers:

“Dr. Smith’s opinion was based on a ‘weight of the evidence’ methodology in which he followed the guidelines articulated by world-renowned epidemiologist Sir Arthur [sic] Bradford Hill in his seminal methodological article on inferences of causality. See Arthur [sic] Bradford Hill, The Environment and Disease: Association or Causation?, 58 Proc. Royal Soc’y Med. 295 (1965).

Hill’s article explains that one should not conclude that an observed association between a disease and a feature of the environment (e.g., a chemical) is causal without first considering a variety of ‘viewpoints’ on the issue.”[3]

The quoted language from the First Circuit opinion, which twice refers to “Arthur Bradford Hill” rather than Austin Bradford Hill, may suggest that neither Chief Judge Lynch, nor her judicial colleagues, nor their law clerks had read the classic paper. An even stronger indication that the appellate court did not actually read the paper is the court’s equating WOE with the Bradford Hill viewpoints, without considering the necessary predicate for those nine viewpoints. In his short paper, Sir Austin clearly spelled out the foundation needed before one turns to the nine viewpoints:

“Disregarding then any such problem in semantics we have this situation. Our observations reveal an association between two variables, perfectly clear-cut and beyond what we would care to attribute to the play of chance. What aspects of that association should we especially consider before deciding that the most likely interpretation of it is causation?”[4]

Whatever Sir Arthur had to say about the matter, Sir Austin defined the starting point of causal analysis as an association free of invalidating bias and random error. The Milward decision ignored this all-important predicate for assessing the various considerations that might allow a valid association to be considered causal.[5] The resulting abridgement was a failure of scientific due process that distorted the Bradford Hill paper.

The First Circuit amplified its error when it asserted that, among the nine considerations, “no one type of evidence must be present before causality may be inferred.”[6] Although Sir Austin said something similar, one of the considerations he noted was “temporality,” which requires that the putative cause come before the effect. Most scientists would regard temporality as essential, unless they were observing events moving faster than the speed of light. The other eight considerations depend more upon the context of the exposures and outcomes of interest, but surely the strength of a clear-cut association, and its consistency across multiple studies, are extremely important considerations.

The First Circuit proceeds from misreading Sir Austin’s paper to misunderstanding another paper invoked by Cranor and by Milward’s lawyers. Carelessly tracking Cranor, the appellate court suggested that there was no “hierarchy of evidence”:

“For example, when a group from the National Cancer Institute was asked to rank the different types of evidence, it concluded that ‘[t]here should be no such hierarchy.’ Michele Carbon [sic] et al., Modern Criteria to Establish Human Cancer Etiology, 64 Cancer Res. 5518, 5522 (2004); see also Sheldon Krimsky, The Weight of Scientific Evidence in Policy and Law, 95 Am. J. Pub. Health S129, S130 (2005).”[7]

This quoted language from the Milward opinion shows how slavishly and credulously the court adopted and regurgitated the plaintiff’s argument. Sheldon Krimsky was actively involved with SKAPP, and his article was presented at the SKAPP-funded Coronado Conference, discussed earlier in this series. Krimsky actually acknowledged that although “the term [WOE] is applied quite liberally in the regulatory literature, the methodology behind it is rarely explicated.”

As for the article by Carbon [sic], this publication never rejected a hierarchy of evidence. The court’s language, quoted above, follows immediately after the court’s discussion of Sir Austin’s nine types of corroborating evidence that would support the causal interpretation of an association. The placement implies, incorrectly, that there was no hierarchy among these considerations.[8]

The court’s language also suggests that the quoted language came from the National Cancer Institute (NCI), but its provenance is quite different. The cited article’s lead author, Michele Carbone (not Carbon), was reporting on a workshop hosted by the NCI at an NCI building; it was not an official NCI event or publication. The NCI did not sponsor or conduct the meeting, and Carbone’s paper was not an official statement of the NCI. Carbone’s paper was styled a “Meeting Report,” and it was published as a paid advertisement in Cancer Research, not as a scholarly article in the Journal of the National Cancer Institute.

The discipline of epidemiology was not strongly represented at the meeting; most of the chairpersons and scientists in attendance were pathologists, cell biologists, virologists, and toxicologists. The meeting report reflects the interests and focus of the scientists in attendance. The lead author, Michele Carbone, a pathologist at the University of Hawaii, was an enthusiastic proponent of Simian Virus 40 as a cause of mesothelioma, a hypothesis that has not fared terribly well in the crucible of epidemiologic science.

The cited article did report some suggestions for modifying Bradford Hill’s criteria in light of modern molecular biology, as well as the group’s sense that there was no “hierarchy” of disciplines with epidemiology at the top. The group definitely did not address the established concept that some types of epidemiologic studies are analytically more powerful than others in supporting inferences of causality, that is, the hierarchy of epidemiologic evidence. Nor did the group address or reject a ranking of the importance of Bradford Hill’s nine viewpoints. There was nothing remarkable about the tumor biologists’ statement that in some cases causality can be determined by careful identification of genetic inheritance or molecular biological pathways. There was no evidence of this sort in the Milward case, and the citation by Cranor and Milward’s lawyers was nothing more than hand-waving.

Carbone’s meeting report summarizes informal discussion sessions at the 2003 meeting. Those in attendance broke out into two groups, one chaired by Brooke Mossman, a pathologist, and the other chaired by Dr. Harald zur Hausen, a virologist. The meeting report included a narrative of how the two groups responded to twelve questions. The court’s citation to this meeting report, drawn from the plaintiff’s (and Cranor’s) argument, rests upon a single sentence in Carbone’s report, answering one of the twelve questions:

“6. What is the hierarchy of state-of-the-art approaches needed for confirmation criteria, and which bioassays are critical for decisions: epidemiology, animal testing, cell culture, genomics, and so forth?

There should be no such hierarchy. Epidemiology, animal, tissue culture and molecular pathology should be seen as integrating evidences in the determination of human carcinogenicity.”[9]

In the fuller context of the meeting, there is nothing particularly surprising about this statement. The full question and answer in the meeting report do not even remotely support the weight the court gave them. There was considerable disagreement among meeting participants over criteria for different kinds of carcinogens, as seen in the report’s answer to another question:

“2. Should the criteria be the same for different agents (viruses, chemicals, physical agents, promoting agents versus initiating DNA-damaging agents)?

There were different opinions. Group 1 debated this issue and concluded that the current listing of criteria should remain the same because we lack sufficient evidence to develop a separate classification. Group 2 strongly supported the view that it is useful to separate the biological or infectious agents from chemical and physical carcinogens due to their frequently entirely different mode of action.”[10]

Carbone and the other authors of the meeting report noted the importance of epidemiology for general causation, while acknowledging its limitations for determining specific causation:

“Concerning the respective roles of epidemiology and molecular pathology, it was noted that epidemiology allows the determination of the overall effect of a given carcinogen in the human population (e.g., hepatitis B virus and hepatocellular carcinoma) but cannot prove causality in the individual tumor patient.”[11]

Clearly, the report was not disavowing the necessity of epidemiology to confirm carcinogenicity in humans. Specific causation of Mr. Milward’s APML was irrelevant to his first appeal to the First Circuit. Carbone’s report emphasized the need to integrate epidemiologic findings with molecular biology; it did not suggest that epidemiology was unnecessary, or urge that it be ignored or disregarded:

“A general consensus was often reached on several topics such as the need to integrate molecular pathology and epidemiology for a more accurate and rapid identification of human carcinogens.”[12]

                 * * * * *

“Ideally, before labeling an agent as a human carcinogen, it is important to have epidemiological, experimental animals, and mechanistic evidence (molecular pathology).”[13]

The court’s implication that there was “no hierarchy of evidence” is unsupported by the meeting report. The suggestion that WOE allows some loosey-goosey, ad hoc, unstructured assessment of diverse lines of evidence is rejected in the meeting report with a careful admonition about the lack of validity of some animal models and mechanistic research:

“Moreover, carcinogens and anticarcinogens can have different effects in different situations. As shown by the example of addition of β-carotene in the diet, β-carotene has chemopreventive effects in many experimental systems, yet it appears to have increased the incidence of lung cancer in heavy smokers. Animal experiments can be very useful in predicting the carcinogenicity of a given chemical. However, there are significant differences in susceptibility among species and within organs in the same species, and differences in the metabolic pathway of a given chemical among human and animals could lead to error.”[14]

Inference to the Best Explanation

The First Circuit asserted that “no serious argument can be made that the weight of the evidence approach is inherently unreliable.”[15] As discussed above, this assertion is demonstrably false. In his testimony at the Rule 702 pre-trial hearing, Cranor characterized WOE as based upon “inference to the best explanation” (IBE), and the First Circuit obsequiously accepted this claim. In articulating and accepting Cranor’s reduction of scientific method to IBE, the appellate court seemed unaware that IBE as an epistemic theory has been roundly criticized. In a very general sense, IBE draws on Charles Peirce’s description of abduction as a mode of reasoning, although many writers have been eager to distinguish abduction from IBE. Bas van Fraassen criticized IBE as lacking merit as a mode of argument, in terms germane both to Cranor’s presentation of the notion and to the First Circuit’s uncritical acceptance of it:

“As long as the pattern of Inference to the Best Explanation—henceforth, IBE—is left vague, it seems to fit much rational activity. But when we scrutinize its credentials, we find it seriously wanting.”[16]

The IBE approach raises the thorny problems of how to discern the best explanation, and how to tell whether an explanation is simply the best of a bad lot. Other philosophers of science have questioned why explanatory power should matter more than predictive ability and resistance to falsification upon severe or robust testing.

In the hands of Smith and Cranor, these philosophical quandaries become largely beside the point. For Smith and Cranor, IBE becomes the telling of just-so stories, which transform “but for” causation into “could be” causation. Drawing directly from Cranor, the First Circuit explained that an inference to the best explanation involves six general steps for scientists:

“(1) identify an association between an exposure and a disease,

(2) consider a range of plausible explanations for the association,

(3) rank the rival explanations according to their plausibility,

(4) seek additional evidence to separate the more plausible from the less plausible explanations,

(5) consider all of the relevant available evidence, and

(6) integrate the evidence using professional judgment to come to a conclusion about the best explanation.”[17]

Of course, assessing causation requires judgment, but Cranor and Smith radically abridge the process of judging by eliminating:

  • the robust testing of, and attempts to falsify, hypotheses,
  • the weighting of study designs,
  • the pre-specification of kinds of studies to be included or excluded, the assignment of weights to different kinds and qualities of studies, and
  • the pre-specification of criteria of study validity, experimental design, consistency, and exposure-response.

The vague, contentless IBE and WOE, in the hands of Smith, operate just as van Fraassen anticipated. With Cranor’s “philosophizing,” IBE creates a permission structure to reach any desired conclusion. Indeed, Cranor’s approach makes no allowance for cases in which careful scientists withhold judgment because the evidence is inadequate to the task. Furthermore, Cranor’s approach and the Milward decision would cheerily approve cherry-picking of studies and of data within studies, post hoc weighing of evidence, and even fabricating and rejiggering of evidence, all of which was on display in Smith’s for-litigation opinion.

The First Circuit uttered its mantra of approval of Smith’s scientific delicts in language that became the target of the revision of Rule 702 in 2023:

“the alleged flaws identified by the [district] court go to the weight of Dr. Smith’s opinion, not its admissibility. There is an important difference between what is unreliable support and what a trier of fact may conclude is insufficient support for an expert’s conclusion.”[18]

Earlier in its opinion, the appellate court quoted from the version of Rule 702 in effect when it heard the appeal:

“if (1) the testimony is based upon sufficient facts or data, (2) the testimony is the product of reliable principles and methods, and (3) the witness has applied the principles and methods reliably to the facts of the case.”[19]

Sufficiency, reliability, and validity were all preliminary questions to be decided by the court as part of its gatekeeping responsibility. The appellate court simply ignored the law in its decision to green-light Smith’s testimony.

                    (to be continued)


[1] Milward v. Acuity Specialty Products Group, Inc., 639 F.3d 11 (1st Cir. 2011), cert. denied sub nom. U.S. Steel Corp. v. Milward, 565 U.S. 1111 (2012).

[2] Austin Bradford Hill, The Environment and Disease: Association or Causation?, 58 PROC. ROYAL SOC’Y MED. 295 (1965).

[3] Milward, 639 F.3d at 17.

[4] Bradford Hill, supra note 2, at 295.

[5] See Frank C. Woodside, III & Allison G. Davis, The Bradford Hill Criteria: The Forgotten Predicate, 35 THOMAS JEFFERSON L. REV. 103 (2013).

[6] Milward, 639 F.3d at 17.

[7] Id. (internal citations omitted).

[8] The Reference Manual chapter on medical testimony carefully discusses the hierarchy of evidence as it factors into the assessment of medical causation. John B. Wong, Lawrence O. Gostin & Oscar A. Cabrera, Reference Guide on Medical Testimony, in National Academies of Sciences, Engineering and Medicine & Federal Judicial Center, REFERENCE MANUAL ON SCIENTIFIC EVIDENCE 687, 723-24 (3rd ed. 2011); John B. Wong, Lawrence O. Gostin & Oscar A. Cabrera, Reference Guide on Medical Testimony, in National Academies of Sciences, Engineering and Medicine & Federal Judicial Center, REFERENCE MANUAL ON SCIENTIFIC EVIDENCE 1105, 1150-52 (4th ed. 2025). Interestingly, the chapter on epidemiology in the third edition of the Reference Manual cited the Carbone meeting report with apparent approval, but the fourth edition of that chapter has dropped the reference. Compare Michael D. Green, D. Michal Freedman & Leon Gordis, Reference Guide on Epidemiology, in National Academies of Sciences, Engineering and Medicine & Federal Judicial Center, REFERENCE MANUAL ON SCIENTIFIC EVIDENCE 549, 564 n.48 (3rd ed. 2011) with Steve C. Gold, Michael D. Green, Jonathan Chevrier & Brenda Eskenazi, Reference Guide on Epidemiology, in National Academies of Sciences, Engineering and Medicine & Federal Judicial Center, REFERENCE MANUAL ON SCIENTIFIC EVIDENCE 897 (4th ed. 2025).

[9] Carbone at 5522.

[10] Carbone at 5521.

[11] Carbone at 5518 (emphasis added).

[12] Carbone at 5518.

[13] Carbone at 5519.

[14] Carbone at 5521.

[15] Milward, 639 F.3d at 18-19.

[16] Bas van Fraassen, LAWS AND SYMMETRY 131 (1989).

[17] Milward, 639 F.3d at 18.

[18] Milward, 639 F.3d at 22.

[19] Milward, 639 F.3d at 14.